Semi-Supervised Novelty Detection
Abstract
A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a solution to the general two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard p-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded p-values, the general SSND framework allows for adaptation to arbitrary alternative distributions in multiple dimensions.
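The reduction described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a plug-in simplification in which a plain logistic regression is trained to separate the nominal sample from the unlabeled sample, and the decision threshold is then set from a held-out nominal sample so that roughly a fraction `alpha` of nominal points is flagged. The function names (`fit_logistic`, `ssnd_detector`), the synthetic Gaussian data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        grad = p - y                             # gradient of the log-loss w.r.t. the score
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def ssnd_detector(nominal_train, unlabeled, nominal_holdout, alpha=0.05):
    """Illustrative plug-in reduction: classify nominal (0) vs. unlabeled (1),
    then threshold the score to target false positive rate alpha on nominal data."""
    X = np.vstack([nominal_train, unlabeled])
    y = np.concatenate([np.zeros(len(nominal_train)), np.ones(len(unlabeled))])
    w, b = fit_logistic(X, y)
    # Threshold at the (1 - alpha) quantile of held-out nominal scores,
    # so about an alpha fraction of nominal points scores above it.
    threshold = np.quantile(nominal_holdout @ w + b, 1.0 - alpha)

    def detect(X_new):
        return (X_new @ w + b) > threshold  # True means "declared novel"

    return detect

# Synthetic data (an assumption for this sketch): nominal points are standard
# Gaussian, novelties are shifted, and the unlabeled sample is 20% contaminated.
rng = np.random.default_rng(0)
nominal_train = rng.normal(0.0, 1.0, size=(500, 2))
nominal_holdout = rng.normal(0.0, 1.0, size=(500, 2))
unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(400, 2)),
    rng.normal(4.0, 1.0, size=(100, 2)),
])

detect = ssnd_detector(nominal_train, unlabeled, nominal_holdout, alpha=0.05)
print("false positive rate on held-out nominal data:", detect(nominal_holdout).mean())
print("flagged fraction of fresh novelties:",
      detect(rng.normal(4.0, 1.0, size=(200, 2))).mean())
```

Note that the classifier never sees a labeled novelty: the unlabeled sample stands in for the second class, which is the point of the reduction. Calibrating the threshold on a separate nominal sample is one simple way to aim for a target false positive rate; the guarantees stated in the abstract concern the paper's own Neyman-Pearson formulation, not this simplified sketch.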
Related resources
Semi-supervised Eigenbasis novelty detection
We present a semi-supervised online method for novelty detection and evaluate its performance for radio astronomy time series data. Our approach uses sparse, adaptive eigenbases to combine (1) prior knowledge about uninteresting signals with (2) online estimation of the current data properties to enable highly sensitive and precise detection of novel signals. We apply Semi-Supervised Eigenbasis...
Use of Time-Aware Language Model in Entity Driven Filtering System
Tracking entities, so that new or important information about those entities is caught, is a real challenge and has many applications (e.g., information monitoring, marketing,...). We are interested in how to represent an entity profile to fulfill two purposes: 1. entity detection and disambiguation, 2. novelty and importance quantification. We propose an entity profile, which uses two languag...
Semi-Supervised Novelty Detection with Adaptive Eigenbases, and Application to Radio Transients
We present a semi-supervised online method for novelty detection and evaluate its performance for radio astronomy time series data. Our approach uses adaptive eigenbases to combine 1) prior knowledge about uninteresting signals with 2) online estimation of the current data properties to enable highly sensitive and precise detection of novel signals. We apply the method to the problem of detecti...
Multiple Instance Learning with the Optimal Sub-Pattern Assignment Metric
Multiple instance data are sets or multi-sets of unordered elements. Using metrics or distances for sets, we propose an approach to several multiple instance learning tasks, such as clustering (unsupervised learning), classification (supervised learning), and novelty detection (semi-supervised learning). In particular, we introduce the Optimal Sub-Pattern Assignment metric to multiple instance ...
Adversarially Learned One-Class Classifier for Novelty Detection
Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end...
Journal title:
- Journal of Machine Learning Research
Volume 11
Publication date: 2010